Robust Modified Policy Iteration

Authors

  • David L. Kaufman
  • Andrew J. Schaefer
Abstract

Robust dynamic programming (robust DP) mitigates the effects of ambiguity in transition probabilities on the solutions of Markov decision problems. We consider the computation of robust DP solutions for discrete-stage, infinite-horizon, discounted problems with finite state and action spaces. We present robust modified policy iteration (RMPI) and demonstrate its convergence. RMPI encompasses both of the previously known algorithms, robust value iteration and robust policy iteration. In addition to proposing exact RMPI, in which the “inner problem” is solved precisely, we propose inexact RMPI, in which the inner problem is solved to within a specified tolerance. We also introduce new stopping criteria based on the span seminorm. Finally, we demonstrate through some numerical studies that RMPI can significantly reduce computation time.
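Since the abstract describes RMPI only at a high level, the following is a minimal sketch of the idea in Python under illustrative assumptions not taken from the paper: a finite, (s, a)-rectangular scenario set so the worst-case inner problem can be solved exactly by enumeration, reward maximization, a fixed number m of partial evaluation sweeps, and a span-seminorm stopping test. The names `robust_mpi` and `robust_q` are hypothetical and do not reflect the authors' implementation.

```python
import numpy as np


def robust_mpi(P_scenarios, R, gamma, m=5, tol=1e-6, max_iter=1000):
    """Sketch of robust modified policy iteration (reward maximization).

    P_scenarios : array (K, S, A, S) -- K candidate transition models forming
                  a finite, (s, a)-rectangular uncertainty set (an assumption
                  made here so the inner problem reduces to enumeration).
    R           : array (S, A)       -- rewards.
    gamma       : discount factor in (0, 1).
    m           : number of partial robust policy-evaluation sweeps.
    """
    K, S, A, _ = P_scenarios.shape
    v = np.zeros(S)

    def robust_q(v):
        # Inner problem: worst case over the scenario set, solved exactly
        # by taking the minimum over the K candidate transition models.
        future = np.einsum('ksat,t->ksa', P_scenarios, v)  # (K, S, A)
        return R + gamma * future.min(axis=0)               # (S, A)

    for _ in range(max_iter):
        q = robust_q(v)
        policy = q.argmax(axis=1)   # greedy policy-improvement step
        v_new = q.max(axis=1)

        # Span-seminorm stopping test: sp(x) = max(x) - min(x).
        delta = v_new - v
        if delta.max() - delta.min() < tol:
            return policy, v_new

        # Partial policy evaluation: m sweeps of the robust operator for the
        # fixed greedy policy.
        v = v_new
        for _ in range(m):
            v = robust_q(v)[np.arange(S), policy]

    return policy, v
```

In this sketch, setting m = 0 behaves like robust value iteration, while letting m grow large approaches robust policy iteration, which is the interpolation between the two previously known algorithms that the abstract refers to.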


Similar Resources

A Unified Approach to Algorithms with a Suboptimality Test in Discounted Semi-Markov Decision Processes

This paper deals with computational algorithms for obtaining the optimal stationary policy and the minimum cost of a discounted semi-Markov decision process. Van Nunen [23] has proposed a modified policy iteration algorithm with a suboptimality test of MacQueen type, where the modified policy iteration algorithm is the policy iteration method with the policy evaluation routine performed by a finite number of...

Full text

Non-Stationary Approximate Modified Policy Iteration

We consider the infinite-horizon γ-discounted optimal control problem formalized by Markov Decision Processes. Running any instance of Modified Policy Iteration, a family of algorithms that can interpolate between Value and Policy Iteration, with an error at each iteration is known to lead to stationary policies that are at least 2γ/(1−γ)²-optimal. Variations of Value and Policy Iteration, that ...

Full text

Learning Robust Options

Robust reinforcement learning aims to produce policies that have strong guarantees even in the face of environments/transition models whose parameters are highly uncertain. Existing work uses value-based methods and the usual primitive action setting. In this paper, we propose robust methods for learning temporally abstract actions, in the framework of options. We present a Robust Options Po...

Full text

Accelerating of Modified Policy Iteration in Probabilistic Model Checking

Markov Decision Processes (MDPs) are used to model both non-deterministic and probabilistic systems. Probabilistic model checking is an approach for verifying quantitative properties of probabilistic systems that are modeled by MDPs. Value and Policy Iteration and modified versions of them are well-known approaches for computing a wide range of probabilistic properties. This paper tries to impro...

Full text


Journal title:
  • INFORMS Journal on Computing

Volume 25, Issue 

Pages -

Publication date: 2013